City-Identification on Flickr Videos Using Acoustic Features

Authors

  • Howard Lei
  • Jaeyoung Choi
  • Gerald Friedland
Abstract

This article presents an approach that uses audio to discriminate the city of origin of consumer-produced videos, a task that is hard to imagine even for humans. Using a subset of the MediaEval Placing Task's Flickr video set, we conducted an experiment with a setup similar to a typical NIST speaker recognition evaluation run. Our assumption is that audio recorded within the same city might be matched in various ways, e.g., language, typical environmental acoustics, etc., without a single outstanding feature being absolutely indicative. Using the NIST speaker recognition framework, a set of 18 cities across the world is used as targets, and Gaussian Mixture Models are trained for all targets. Audio from the test-set videos is scored against each of the targets, yielding one score for each pair of test file and target city model. The Equal Error Rate (EER), obtained at the scoring threshold where the false-alarm rate equals the miss rate, is used as the performance measure of our system. We obtain an EER of 32.3% on a test set that shares no users with the training set, and a minimum EER of 22.1% on a test set that shares users with the training set. The experiments show the feasibility of using implicit audio cues (as opposed to building explicit detectors for individual cues) for location estimation of consumer-produced “from-the-wild” videos. Since audio is likely complementary to other modalities useful for the task, such as video or metadata, the presented results can be used in combination with results from other modalities.
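As an illustration of the performance measure described in the abstract, the following is a minimal sketch of an EER computation, not the authors' evaluation code. It assumes the scores have already been split into target trials (test file scored against its true city) and non-target trials (test file scored against any other city); the function name `equal_error_rate` and the toy scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) found by sweeping over the observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    read off at the threshold where miss rate and false-alarm rate are
    closest; with finitely many trials they may not be exactly equal,
    so the two rates are averaged at that point.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # Misses: target trials rejected at this threshold.
        miss_rate = sum(s < t for s in target_scores) / len(target_scores)
        # False alarms: non-target trials accepted at this threshold.
        fa_rate = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss_rate - fa_rate)
        if best is None or gap < best[0]:
            best = (gap, (miss_rate + fa_rate) / 2, t)
    return best[1], best[2]


# Toy example with made-up scores (not data from the paper).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(eer, threshold)  # 0.25 0.6
```

In this toy case one of four target trials is missed and one of four non-target trials is falsely accepted at threshold 0.6, so both rates equal 25%.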


Similar resources

Multimodal City-Identification on Flickr Videos Using Acoustic and Textual Features

We have performed city-verification of videos based on the videos' audio and metadata, using videos from the MediaEval Placing Task's video set, which contains consumer-produced videos “from-the-wild.” Eighteen cities were used as targets, for which acoustic and language models were trained, and against which test videos were scored. We have obtained the first known results for the city verificat...


Persona Linking: Matching Uploaders of Videos Across Accounts

This article presents an approach to link the uploaders of videos based on the audio track of the videos. Using a subset of the MediaEval [10] Placing Task's Flickr video set, which is labeled with the uploader's name, we conducted an experiment with a setup similar to that of a typical NIST speaker identification evaluation run. Based on the assumption that the audio might be matched in various ways (...


Exploiting Social Links for Event Identification in Social Media

We explore the use of social links (e.g., comment and authorship connections) for identifying events and their associated documents (e.g., photos, videos) in social media sites. To understand the potential benefits of using social links for this task, we analyze a network of author comments associated with photographs in a large-scale Flickr data set. Our preliminary experiments, building on ba...


Geotagging Flickr Photos And Videos Using Language Models

This paper presents an experimental framework for the Placing tasks, both estimation and verification, at MediaEval Benchmarking 2016. The proposed framework provides results for four runs: first, using metadata (such as user tags and titles of images and videos); second, using visual features extracted from the images (such as Tamura); third, using the textual and visual features together; and ...


Automatic Geo-referencing of Flickr Videos

We present a hierarchical, multi-modal approach for geo-referencing Flickr videos. Our approach makes use of external resources to identify toponyms in the metadata and of visual features to identify similar content. We use a database of more than 3.6 million Flickr images to group them into geographical areas and to build a hierarchical model. First, the geographical boundaries extraction meth...




Publication date: 2011